adding pandas.api.typing.aliases and docs #61735
base: main
If we are to make these public, what is the process of making changes to them?
doc/source/reference/aliases.rst
Outdated
.. currentmodule:: pandas.api.atyping.aliases

The typing declarations in ``pandas/_typing.py`` are considered private, and used
by pandasdevelopers for type checking of the pandascode base. For users, it is
by pandasdevelopers for type checking of the pandascode base. For users, it is
by pandas developers for type checking of the pandas code base. For users, it is
This also occurs several more times below.
fixed in next commit
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -83,6 +83,7 @@ Other enhancements
- Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`).
- Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`)
- Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`)
- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
I would suggest not advertising where they come from.
- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
- Many type aliases are now exposed in the new submodule :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
in next commit
Axes,
Axis,
ColspaceArgType,
CompressionOptions,
There are many type aliases here where it is not clear what method(s) they are appropriate for. E.g. it would be wrong to use this for `DataFrame.to_parquet`.
I tried to cover that in the docs, without getting too specific. I can make the docs more specific, although there are cases where the aliases are used in lots of methods, so the list can get quite long. E.g., for `CompressionOptions`, I said "Argument type for `compression` in many I/O output methods".
Open to suggestions as to how to better document this.
The only resolution I see is to introduce more aliases, e.g. `ParquetCompressionOptions` and `CsvCompressionOptions`. This would be my preference, but I can understand if there is an aversion to this.
In any case, if we deem something to be not "sufficiently good" I think we should refrain from releasing something new. That is my take on some of the aliases here, but I won't block if I'm alone.
The current aliases follow what's in the code. So in your example, right now the type for `compression` in `to_parquet()` is `str | None`, while for `to_csv()` it is `CompressionOptions`. If we improve the typing in the code, then we can improve it here by introducing new aliases.
Shouldn't those improvements be made prior to making them public?
My suggestion would be that if someone adds an alias to
@Dr-Irv - my question is about how do we go about changing the definition of aliases that we have already made public, not about adding new aliases.
We just edit
And break user code without warning? Can we introduce such breakages in minor or patch releases? While most breakages I would expect to be of a type-checking nature and therefore an annoyance, type-hints can be enforced in runtime and changes in this regard can introduce runtime breakages as well.
I am pretty sure we can change the definition of an alias without breaking user code, unless people do introspection on those aliases, which is not a supported usage of aliases anyway. For example, let's say we implement a new sorting algorithm and change If we deleted or renamed an alias, then user code could potentially break. But at least my observation has been (by getting alerts to when anyone makes PRs that change The renaming issue probably exists for everything in
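The difference between widening and narrowing an alias can be sketched with a `Literal` type. Everything below is hypothetical: the three alias names, the `"newsort"` value, and both helper functions are assumptions made for illustration and are not taken from pandas.

```python
from typing import Literal, get_args

# Hypothetical versions of one alias before and after a change.
SortKindBefore = Literal["quicksort", "mergesort", "heapsort", "stable"]
SortKindWidened = Literal["quicksort", "mergesort", "heapsort", "stable", "newsort"]
SortKindNarrowed = Literal["quicksort", "mergesort", "stable"]


def newly_rejected(old_alias: object, new_alias: object) -> set:
    """Values the old alias accepted that the new alias no longer accepts."""
    return set(get_args(old_alias)) - set(get_args(new_alias))


def user_sort(values: list, kind: SortKindBefore = "quicksort") -> list:
    # The annotation is not enforced at runtime; only a type checker reads it.
    return sorted(values)


# Adding a value leaves every previously valid argument valid.
print(newly_rejected(SortKindBefore, SortKindWidened))
# Removing a value means a type checker would newly flag "heapsort".
print(newly_rejected(SortKindBefore, SortKindNarrowed))  # {'heapsort'}
```

Under these assumptions, adding a value is invisible to existing callers, while removing one changes what a type checker accepts even though plain annotated code still runs.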
Or remove or rename an existing sorting algorithm?
I think you're saying we don't support the enforcement of pandas type-aliases at runtime (e.g. use with Pydantic), is that right? Is this documented?
That's fine, but I'm -1 here until we have a plan that is documented about how we would do so if such a case were to come up. I'm very flexible on what that plan could be, but there needs to be a plan.
These are public classes and need to go through the usual deprecation cycle if we were to remove or rename them.
by pandas developers for type checking of the pandas code base. For users, it is
highly recommended to use the ``pandas-stubs`` package that represents the officially
supported type declarations for users of pandas.
Note that the definitions and use cases of these aliases are subject to change.
I think this is implying that they are subject to change without any user notice. If that is the case, can this be made more explicit and put in a `.. warning::` box? Perhaps something like
... are subject to change without notice in any major, minor, or patch release of pandas.
I would also be okay with only saying `major or minor`; it seems okay to me to say we can promise not to make changes in patch releases.
So if we were to change the runtime allowable string for a sorting algorithm, e.g.,
The code is inconsistent. Sometimes we check that the arguments are of the right possible values, sometimes we don't. But it is not related to the aliases themselves. My sense is that we shouldn't document this at all. We say that the aliases are for type checking.
I think we have to treat them like we do other code changes. Not sure where to document that.
So we can do that if we decide to rename or delete an alias, right?
Also worth mentioning that @simonjayhawkins suggested making this "experimental" in #55231 (comment), although I'm not sure that's the right word here. I think the warning you suggested covers this, and I have added that in the most recent commit.
I do not think this is possible. To my knowledge we have no process to warn users of the upcoming change to a type alias. This is unlike other parts of the pandas code where we can emit deprecation warnings, put behaviors behind flags, and the like. Happy to be wrong here; to make this explicit could you detail how we'd go about adding or removing a case to
A large part of the community is also enforcing type-hints at runtime, e.g. via Pydantic. It seems to me if we are going to make these public, we should not handcuff users by disallowing this kind of usage.
I don't think we have to notify in this case.
Yes, but I don't think you can enforce
For example - you can't call
>>> from pandas._typing import ArrayLike
>>> ArrayLike
typing.Union[ForwardRef('ExtensionArray'), numpy.ndarray]
>>> import numpy as np
>>> arr=np.array([1,2,3])
>>> isinstance(arr, np.ndarray)
True
>>> isinstance(arr, ArrayLike)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Condadirs\envs\pandasstubs\lib\typing.py", line 1260, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "C:\Condadirs\envs\pandasstubs\lib\typing.py", line 1264, in __subclasscheck__
    if issubclass(cls, arg):
TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union
So these only have value in type declarations.
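The failure above comes from the forward reference inside the union. As a minimal sketch of what a runtime check would have to do instead, the hypothetical helper below (not part of pandas) unpacks a union alias with `typing.get_args` and refuses any alias that still contains an unresolved name. `NumberLike` and `WithForwardRef` are stand-in aliases built from builtin types, an assumption made so the snippet runs without pandas or NumPy.

```python
from typing import ForwardRef, Union, get_args


def isinstance_of_alias(value: object, alias: object) -> bool:
    """Check value against a Union alias whose members are all real classes."""
    members = get_args(alias)
    unresolved = [m for m in members if isinstance(m, (str, ForwardRef))]
    if unresolved:
        # A forward reference is only a name, so there is no class to test.
        raise TypeError(f"alias has unresolved members: {unresolved!r}")
    return isinstance(value, members)


# Stand-in aliases (hypothetical), built from builtin types only.
NumberLike = Union[int, float]
WithForwardRef = Union["NotDefinedAnywhere", int]

print(isinstance_of_alias(3, NumberLike))    # True
print(isinstance_of_alias("3", NumberLike))  # False

try:
    isinstance_of_alias(3, WithForwardRef)
except TypeError as exc:
    print(f"cannot check at runtime: {exc}")
```

An alias shaped like `ArrayLike`, which holds a forward reference, falls into the last case, which is consistent with the point that such aliases are meant for type declarations.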
import numpy as np
from pydantic_settings import BaseSettings
from pandas._typing import ArrayLike

class Foo(BaseSettings):
    x: ArrayLike

Foo(x=np.array([1, 2]))  # Succeeds
Foo(x=1)  # ValidationError
I’m without a laptop for 2 weeks and on a plane about to take off, but I’m pretty sure the type checkers would also flag this as an error. I wouldn’t expect people to use the aliases without type checking turned on, so the error above would be caught before runtime, i.e. by the type checkers. If we assume people importing an alias type check their code before executing it, then we should be fine. I’m fine with putting something in the docs that explains that, if you think it helps.
pandas/tests/test_api.py:TestApi.test_api_typing_aliases()
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
This is my first proposal for adding the typing aliases that are "public" so that people do not import from `pandas._typing`.